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1 Definitions 

We introduce these scaled variables: 

• the scaled mutation rate 8 — 2N 0 fi 

• the scaled sweep rate v = 2NqV 

• the scaled selection coefficient a = 2Ns 

• the scaled time to the common ancestor of two speces r = t/2No 

• the scaled effective population size A = N/N 0 . For most of the derivation, we consider A = 1 and 
will relax this further below. 

2 General allele frequency distribution 

2.1 Neutral equilibrium allele frequency distribution 

We consider a single biallelic site with two alleles 0 and 1. We denote the frequency of allele 1 in the 
population with x. In the absence of any selective advantage, a symmetric mutational process with scaled 
rate 9 and an effective population size N, genetic drift will lead to an equilibrium distribution [3] which 
for small mutation rates can be decomposed into two terms [2]: 
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with the partial distribution 



Q a (x; 9) = ^y((l - a)(l - a:) + ax) (x(l - x)) 
and the normalization factor 

Z (9) T{6)2 

To derive the probability to observe a discrete allele count rather than a continuous allele frequency 
we model the binomial sampling process explicitly. The probability to observe k out of m individuals 
carrying allele 1 is then given as a binomial moment of the equilibrium distribution: 



M(k; to, 6>) = Q J x k {\- x) ( m - k)Q(x; 9)dx 



(1) 



which leads again to the independent partial distributions 

M a (k;m,6) = ( u ) -^^((l - o)(m - k) + ak + 26)- v 

for a = 0, 1. 



kJZ a (9) yy ' y ' ' r(l + m + 26>) 



2.2 Hitchhiking with recurrent selective sweeps 

We model recurrent selective sweeps as a Poisson process with scaled rate v. The probability that no 
sweep occurs for a time t (in scaled units of 27V generations) is then exponential: 

Prob(nosweep)(t) = e~ vt 

We approximate the expected scaled time it takes for a neutral polymorphism to reach allele frequency 
x by t — x (again in units of 2N generations) and express the partial equilibrium probability under 
recurrent sweeps approximately as 

Q a (x;6,v) = Q a (x;6)e-^ 1 -^ x + a ( 1 - x » + ]T 6(x - b)h a (b,6,v) (2) 

6={0,1} 

where the sum over the two delta distributions accounts for complete hitchhiking to the two fixed states 
b = {0, 1} and which will be given further below. 

We can again derive binomial moments to express the discrete probability distribution for sampling 
k out of to individuals with allele 1: 

w „ a \ ( m \ 2 f -v , ^r(k + 9)T(-k + m + 9) 
M 0 (fc;m,M)=y^(ae + (1 - a)) x 

aiF 1 (k + 9,m + 29, (2a - 1)) - m ~ k J" iFi(fc + 6>, 1 + to + 26, (2a - 1» 

to + 29 

+ ^2 5 k , bm h a (b,9,v) 

6={0,1} 

where the last term again account for the fraction of hitchhiking alleles and affects the two boundary 
states k = 0 and k — to. Here, \Fi(a, b, x) denotes the confluent hypergeometric function. 

The hitchhiking fraction is derived by integrating the probability to hitchhike from frequency x to 
frequency b = 0 or b = 1: 

h a {b,6,v)= / Q a (x-9)(bx+(l-b)(l-x))(l-e-^ 1 - a ^ +a ^- x ^ 



dx 

where the term bx + (1 — b)(l — x) just accounts for the fact that the allele hichhikes to b = 1 with 
probability x and to b = 0 with probability 1 — x. Using the above defined moments M a (k; to, 9) we can 
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compute this integral to: 

h a {b,6,v) =M a {b;l,( 



-2ae~ v + (1 - a) T(b + 0)T(l -b + 6) 



Z a (0) 



r(i + 20) 



-aiFi(6 + 6,1 + 29, (2a - l)v) + - -jr i-Fi(6 + 9,2 + 29, (2a - l)v) 

1 + 29 

In the following figure we plot both the continuous and the discrete neutral model with and without 
hitchhiking (parameters: /i = 0.025, v = 2): 
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As expected, the hitchhiking model predicts fewer common variants in comparison to the standard 
model. 



2.3 Selection 

We can add selection to the standard model without hitchhiking, following [2]: 



a(x; 9, a) = YjhiM 1 X ^ 1+B { l - e^ 1 -^- 1 ^) 



with the normalization factor 



m 2 



Z a (9, a) = ^ (l - e-^- a ^ 1 F 1 (e, 29, a) 

The binomial moments are derived again by integration, similar to equation 1: 

AnN 1 T(k + 9)T(-k + m + i 
M a {k;m,e,a) =| . I — r — — — - 



k J Z a {9,a) T(m + 29) 

(l - e'^-^iF^k + 9,m + 29, a] 

We can now apply the same modification using an exponential factor for hitchhiking as we did in 
equation 2, which for the equilibrium distributions under selection leads to: 

Q(x\H,a,v) = Q a (x;9,a)e-^ 1 - a)x+a{1 ^ + 5(x - b)h a {b,9,a,v) 

b={0,l} 

with the hitchhiking fraction 

^ e ( 1 -«) CT 1 F 1 (6 + 9,1 + 29, -(1 - a)v + av) - x Fi(b + 9,1 + 29, s - (1 - a)v + av)^j 
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and the binomial moments 

r e ( 1 - a )°' 1 F 1 (fc + 9,m + 20,-(l - a)v + av) - 1 F 1 (k + 6,m + 29, a - (1 - a)v + av)j 

+ S k,bmh a (b,Q,<T,v) 
6={0,1} 

We plot again both the continuous and the discrete neutral model with and without hitchhiking, but 
including selection (same parameters as above but a = 2) 
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As expected, due to directional selection, allele frequencies are skewed towards the B- allele (i.e. higher 



2.4 Substitutions and divergence from outgroup 

To model the divergence from a related species, we exploit the fact that we generally consider relatively 
small mutation rates, which leads to a separation of the substitution (fixation) time scale from the 
polymorphism time scale [1]. In this regime, which corresponds to 9 -C 1, we can model fixations 
independently from polymorphisms as a Poisson process. Without hitchhiking and in the weak mutation 
regime, the rate per scaled unit time of this process is approximately [2] 



u(0,a) 



9o 



1 - e- 



Under hitchhiking with an effective rate v (see above) , we have previously shown [4] that the substi- 
tution rate changes to approximately 



u(6, cr, v) = 



^,(7)2^(1,^,1 + 1,1 



\<r\+u 



for rj > 0 
for a < 0, 



which effectively reduces the rate of beneficial mutations, and increases the rate of deleterious mutations, 
making them more neutral-.- 

We abbreviate u + = u(6, +<j, v) and it_ = u{6, —a, v) and will omit the dependencies on 9, a and 
which are always implicit in the following expressions. The rates u+ and u_ form a two state Markov 
process with states a = {0, 1} and a rate matrix 



R 



and transition probability 

T(r) = exp(Rr) 



0 

-u+ 



u+e 



1 



(3) 



-t(ji + +u_) 



u + \u + —u + e T ( u ++ U -) u + + u_e 



u _ _ M _ e -r(«++u_) 
(ti + +u_) 
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The equilibrium state A for this transition probability is 

A=^— («+ 

where we again left out the dependency on 9, a and v for brevity. 

This transition probability lets us write the probability to observe a frequency k\ out of vn\ samples 
with allele B in species 1, and k 2 out of m 2 samples in species 2, where both species share a common 
ancestral species at time r in the past: 

ill 

M(h,k 2 ; m 1 ,m 2 ,T)=J2Yl Wa(T(r)) 0l , a (T(r)) a2 , a M Ol {k 1 ;m 1 )M a2 (fc 2 ; m 2 ). 

a— 0 ai— 0 a2— 0 

The indices in the matrix T and the vector A denote row and column, respectively. This expression 
makes use of the fact that the polymorphisms dynamics (reflected by M a (k; m)) are independent of the 
substitution dynamics (reflected by T(r)) in the weak mutation regime set by 9 -C 1. 

The outgroup-directed allele frequency as used in the data analysis is now simply a sum over the two 
cases in which the single outgroup-sample carries either of the two alleles: 



P(k; m, r, 9, a, v) = M(k, k'; m, 1, r, 9, a, v). 



fc'=0 



This is the most general allele frequency probability distribution, on which all models for parameter 
estimation as described in Material and Methods are based on. The following figure shows this probability 
for 9 = 0.025 and r = 5 and different values for a and v as indicated: 
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Note that this probability is independent with respect to the sign of a. The reason is that we compute 
the difference in the two species, without direction of an ancestral vs. derived allele. If a is negative, it 
means that the A allele is the fitter one, but the probability to observe k out of m samples with a different 
allele than the outgroup is the same if B was the fitter allele. We therefore treat a as a parameter on 
the positive domain of real values. 

In all of the above we have considered N — No, i.e. A = 1 (see section 1). We can generalize by 
scaling all scaled parameters by A: 

l 

P(k; m, t, 9, a, v, A) = V M{k,k';m,l,T/K,K9,Ka,Ky). 

k'=0 



3 Models 

3.1 Basic Models for synonymous sites 

To work with real data and to infer parameters reliably, we define simplified subsets of the full model, 
by setting some parameters to default values. In particular, we define these basic models: 
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Unlinked adaptation model: The unlinked model has as free parameters only the scaled mutation 
rate 9 and the divergence time r. Other parameters are fixed, so that the resulting probability can be 
written as: 

-Punlinkcd(fc; m, T, 9) = P(k) TO, T, 9, 0, 0, 1). 

Background selection model: The background selection model (BGS) has as additional free 
parameter the effective population size A: 

PBGs(k; m, r, 9, A) = P(k; to, t, 9, 0, 0, A) 

Linked Adaptation model: With hitchhiking, we use one further parameter v. 

■PiinkcdO; m, t, 9, v, A) = P(k; to, t, 9, 0, v, A) 

Directional selection model: We also define a model with background selection and direct selection 
(see Supplementary Figures S2 and S3): 

P scl (fc; to, t, 9, a, A) = P(k; to, t, 9, a, 0, A) 

3.2 Mixed model for heterogeneous data sets 

For our mixed model we add together these components with different weights: 

• Neutral component: A fraction c„ of sites evolves neutrally, but generally under hitchhiking. 

• Weakly selected component: A fraction c w of sites evolves under weak directional selection. 

• Adaptive component: At a fraction c a of sites we assume that adaptive evolution has generated fixed 
differences between the two species. This fraction accounts for an observed surplus of substitutions 
with respect to the neutral expectation. 

• Conserved component: The remainder of the above, with fraction c c = 1 — c„ — c w — c a is assumed 
to be under strong directional selection which accounts for an observed surplus of conserved sites 
with respect to the neutral expectation. 

Each component has its own specific outgroup-directed allele frequency distribution. First, the neutral 
component is simply one of the above derived basic models without selection: 



P n (k;m,T,9,v, A) 



Punlinkcd(k;m,T,9) 

P BGS (k;m,T,9,X) (4) 

^Plinked{k]m,T, 9, V, A), 



defining three separate mixed models. The weakly selected component uses the full probability derived 
above, including a selection coefficient a, which we typically constrain to 1 < a < 150: 

P w = P(k;m,T,9,a,v,X). (5) 

The adaptive component is only a surplus of fixed differences between the two species, so it is simply 

P a (k;m) = 5 k , m (6) 

with the Kronecker-Delta which sets this probability to zero everywhere except at k — to where it is one. 
Analogously, the conserved component is: 



P c (k; to) 



>k.O 



Note that each of these components is normalized across 0 < k < to. 
The full probability of the mixed model is then: 

P(k; to, 9) = c n P n (k; to, t, 9, v, A) + c w P n (k; m,r, 9, a, v, A) + c a P a (k; m) + (1 - c n - c w - c a )P c (k; to). 

This full model has 7 parameters 0 = {t, 9, a, c n ,c Wl c a }. 
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3.3 Maximum Likelihood estimation 



We consider a data set of outgroup-directed allele frequencies with a fixed sample size m. We denote 
the number of sites with allele frequency k by n k . The total likelihood of the data given parameters 6 
is then: 

m 

£(n k ;m,G) = ]J P(k; m, 8)" fc . 

fe=0 

In practice we use the log-Likelihood 

m 

\og£(n k ;m, 8) = n k log P(k;m, 8). 

fc=0 

The parameters are then learned by maximization of the log-Likelihood: 

8 = argmax e / \og£(n k ; m, 8'). 

As pointed out in the text, we typically follow a hierarchical protocol to estimate parameters from 
data. Assuming, all sites have been binned according to the local recombination rate (see Methods in the 
main article), we first use the unlinked model to infer r (under free variation of 9) from synonymous sites 
in the highest recombination bin. We then use the background selection and linked adaptation models 
to infer 9, A and v for each bin, keeping t fixed at the value inferred from the highest recombination 
bin using the unlinked model. We then learn the rest of the parameters from non-neutral annotation 
categories using the mixed models, keeping the neutral parameters fixed to their values obtained from 
the background selection or the linked adaptation model. Numerical maximization is implemented using 
Powell's method [5]. 

3.4 Types of substitutions in the mixed model 

In the mixed model, we implemented different components which contribute differently to the amount 
of fixed differences between species. We make use of the three components P n , P w and P a as defined in 
equations 4, 5, 6. First, we can estimate the fraction of sites with neutral substitutions that have been 
fixed by drift alone: 

/drift = c n P n (k = m;m,T,9,v = 0,\)e~ v , 

where e~ v is the probability that no linked sweep occurs during the typical time of fixation of a neutral 
variant (2N 0 generations). We also define the fraction of sites with neutral substitutions fixed by drift 
and by hitchhiking: 

/drift+HH = c n P n (k = m; m, r, 0, v, A). 
From these two, we obtain the hitchhiking fraction via: 

/HH = /drift+HH — /drift- 

We obtain the fraction of sites with weakly selected substitutions that have been fixed by drift via: 

/sol, drift = c w P w (k = m; m, t, 6,a,v = 0, A)e~", 

and the fraction of sites with weakly selected substitutions fixed by drift and hitchhiking 

/sei,drift+HH = c w P(k = m; m, t, 9, a, v, A), 

which allows us to derive the fraction of sites with weakly selected substitutions fixed by deleterious 
hitchhiking: 

/sel.HH = /sel,drift+HH — /scl, drift- 

We can also write down the fraction of sites with adaptive substitutions: 

/adaptive — c a • 
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4 Fitness Flux 



Fitness flux was introduced in [2] as the product of the rate of substitutions and their average selection 
coefficient. To estimate fitness flux, we first compute the rate of adaptive substitutions per scaled unit 
time, k a , from our fraction f a = c a above: 

k 

ka ~ 2r' 

because 2r is the total branch length between the two species, and c a is the expected number of sub- 
stitutions per site. If we knew the average selection coefficient of adaptive substitutions, s a , the fitness 
flux $ would be 

* = k a s a , (7) 

per scaled unit time. We cannot measure s a directly from our framework, but we have an indirect 
measure using the rate of linked sweeps, v. As has been shown in [6] and [7], the typical "effect width" 
of a single selective sweep with selection coefficient s a is 

Sa 

w = a — , 
r 

where r is the recombination rate, and a is a constant, which depends on model assumptions and 
parameters such as the absolute effective population size. For relevant parameters, a lies between 0.1 
and 0.5, as computed in [7]. Here we fix a = 0.1 to be conservative in estimating fitness flux and s a as 
follows: We can now relate the effective rate of linked sweeps, v, which is simply to total rate of adaptive 
mutations within a window of width w, with the fitness flux (see also [2]): 

v = wk a > 0.1fc a — = 0.1-, 
r r 

So the rate of linked sweeps v is directly linked to the fitness flux per recombination map length. In 
case we do not observe any positive rate of linked drivers, we estimate a conservative upper bound on 
the fitness flux using the inequality v < 1 (with v = 1 being a typical value that we can measure with 
confidence). In summary, we compute fitness flux as: 



< lOz^r if v > 1 

< lOr if v < 1 



as an upper bound on fitness flux in regions with r > 0. Having estimated $, we can simply solve for s a 
using equation 7. 

Similarly to the rate of beneficial mutations, we can estimate the total rate of weakly deleterious 
substitutions via 

, /sel,drift+HH 
kd= ^T ' 

which then lets us estimate a negative component to the fitness flux as 

, k d a 

<f>_ = , 

2 2iV 0 ' 

where a is the scaled selection coefficient as used in the mixed model, and the factor 1/2 accounts for 
the fact that in equilibrium only half of the substitutions are deleterious, the other half is compensatory 
and hence beneficial. We used a diploid population size of 1.78 x 10 6 , as used in [8] and [9]. We show 
this estimate in Supplementary Figure 6c). In the definitions above, fitness flux is defined in units of 
1/2N 0 . However, it is intuitive to use as units fi/2N 0 , with fi estimated from our estimates of 9 and the 
above population size. In these units, a fitness flux of 1 can for example be generated by mutations with 
selection coefficients of 1/2 AT, which fix with the neutral substitution rate (fi). We use these units in 
Figure 5. 

To report a fitness flux per million years, we assume a generation time of 0.1 years in Drosophila, 
which yields a neutral substitution rate of one substitution per 0.1/^i« 30 x 10 6 years. 

We also translate these rates to fitness flux per gene, for which we use an average number of 1,000 
nonsynonymous sites per gene in the autosomes in Drosophila, obtained from the annotations described 
in methods. 
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5 Genetic Load 

Genetic load as used here quantifies the amount of fitness loss generated by fixed deleterious mutations. 
The probability for a weakly selected site to be in the less fit state follows from the equilibrium state of 
the Markov process defined in equation 3: 



u + + u_ ' 

where the fixation rates u + and it_ are described in section 2.4. The genetic load is then simply: 

I = (TC tu A_ 

where c w is the fraction of weakly selected sites, and a is their selection coefficient, as defined in the 
mixed model. 
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